Abstract

The major histocompatibility complex (MHC) plays a central role in the adaptive 
immune system and provides a good model with which to understand the evo- 
lutionary processes underlying functional genes. Trans-species polymorphism and 
orthology are both commonly found in MHC genes; however, mammalian MHC 
class I genes tend to cluster by species. Concerted evolution has the potential to 
homogenize different loci, whereas birth-and-death evolution can lead to the loss 
of orthologs; both processes result in monophyletic groups within species. Studies 
investigating the evolution of MHC class I genes have been biased toward a few 
particular taxa and model species. We present the first study of MHC class I genes 
in a species from the superfamily Musteloidea. The European badger (Meles meles) 
exhibits moderate variation in MHC class I sequences when compared to other 
carnivores. We identified seven putatively functional sequences and nine pseudo- 
genes from genomic (gDNA) and complementary (cDNA) DNA, signifying at least 
two functional class I loci. We found evidence for separate evolutionary histories 
of the α1 and α2/α3 domains. In the α1 domain, several sequences from different
species were more closely related to each other than to sequences from the
same species, resembling orthology or trans-species polymorphism. Balancing
selection and probable recombination maintain genetic diversity in the α1 domain,
evidenced by the detection of positive selection and a recombination event. By
comparison, two recombination breakpoints indicate that the α2/α3 domains have
most likely undergone concerted evolution, where recombination has homogenized
the α2/α3 domains between genes, leading to species-specific clusters of sequences.
Our findings highlight the importance of analyzing MHC domains separately. 



Introduction 

The major histocompatibility complex (MHC) is of particu- 
lar importance to the study of evolutionary genetics owing to 
its pattern of molecular evolution. MHC is a diverse gene fam- 
ily that plays a crucial role in the vertebrate adaptive immune 
system and in autoimmunity. Cell surface glycoproteins, en- 
coded by the MHC genes, are vital in both humoral and 
cell-mediated immune responses, as they bind and present 
antigens to T cells and trigger an immune cascade (Swain
1983). MHC genes are classified into groups including class I
and class II. MHC class II molecules principally bind peptides 
from the extracellular environment and are only expressed 
on antigen-presenting cells, such as B cells and macrophages 
(Hughes and Yeager 1998). MHC class I genes comprise classical
(class Ia) and nonclassical (class Ib) loci that differ in
polymorphism, structure, function, and expression pattern
(Parham and Ohta 1996; Rodgers and Cook 2005). MHC
class Ia molecules are responsible primarily for intracellular
antigen binding and are expressed on the surface of all
nucleated somatic cells (Bjorkman and Parham 1990). Because
of this crucial role in the immune system, MHC genes
are under constant selective pressures due to challenges from 
parasites and pathogens (Jeffrey and Bangham 2000; Piertney 
and Oliver 2006). This arms race between pathogens and 
hosts is posited to be the driving force for the extreme
diversity in MHC genes (e.g., class Ia genes such as HLA-A,
HLA-B, and HLA-C in humans; Piertney and Oliver 2006). 

The high degree of diversity among MHC alleles arises 
from the diverse exons that encode the domains forming the 
antigen-binding site (ABS; Hughes and Yeager 1998). The nucleotide
diversity within the MHC genes has been attributed
to balancing selection (Bergstrom and Gyllensten 1995), such 
as overdominance and frequency-dependent selection, which 
act to maintain large numbers of alleles in populations. The 
persistence of ancestral allelic diversity over long periods of 
time, relative to neutral genetic variation (Richman 2000), 
is enhanced substantially by balancing selection, which leads 
to high levels of allelic diversity within species (Hughes and 
Yeager 1998; Penn et al. 2002). Some mammalian MHC al- 
lelic lineages are more than a million years old and are main- 
tained after speciation (Figueroa et al. 1988). Phylogenetic 
reconstructions therefore reveal trans-species polymorphism 
(Klein 1987; Klein et al. 1998, 2007), where alleles between 
species are more closely related (or even identical) than al- 
leles within species. Phylogenetic reconstructions can also 
reveal orthologous relationships, by which sequences group 
by gene rather than by species, forming orthologous gene 
clusters (Nei et al. 1997; Nei and Rooney 2005). In orthologous
clusters, the divergence pattern of genes reflects species
phylogeny (when trans-species polymorphism does not also
occur) because genes have diverged from a common ancestor 
due to a speciation event (Fitch 2000). 

In contrast to class II genes, which usually exhibit ortho- 
logy and trans-species polymorphism in mammals, class I 
genes tend to form monophyletic groups within species and 
therefore it is difficult to establish orthologous relationships 
among closely related mammalian species (Takahashi et al. 
2000). One plausible explanation is that class I loci undergo 
a faster rate of birth-and-death evolution than do class II 
loci (Nei et al. 1997; Vogel et al. 1999; Takahashi et al. 2000; 
Piontkivska and Nei 2003), whereby new genes are created 
by gene duplication and some genes become nonfunctional 
through deleterious mutations, or become deleted from the 
genome. The orthologous relationship may therefore become 
lost due to birth-and-death evolution. The high divergence 
of class I genes between closely related species has also been 
posited to be due to frequent genetic exchange by recom- 
bination or gene conversion termed "concerted evolution" 
(Rada et al. 1990; Hess and Edwards 2002; Nei and Rooney 
2005). This yields genes within a species that are more sim- 
ilar to each other than they are to the orthologs in closely 
related species; that is, intraspecific paralogs are more similar
than interspecific orthologs. The homogenizing effect of
recombination can mask orthology and duplication history 
and impede the reconstruction of the phylogeny. With higher 
rates of concerted evolution, divergence between orthologs 
increases, which makes the signal for orthology harder to 
detect. In the concerted evolution model, different loci are 
homogenized constantly by gene conversion (Rada et al. 1990;
Nei and Rooney 2005). This tends to eliminate variation and 
generates intraspecific similarity while reducing interspecific 
similarity, thus facilitating diversification. Exchange of se- 
quence segments by gene conversion, and recombination, 
plays a prominent role in the evolution of the MHC genes 
(Parham and Ohta 1996; Jakobsen et al. 1998; Richman et al.
2003). In addition to its homogenization effect, it has been
proposed that recombination creates and maintains diversity 
in the MHC genes (Holmes and Parham 1985; Jakobsen et al. 
1998). The consequent phylogenetic inconsistencies between 
different regions of the same gene, due to concerted evolu- 
tion, have been observed in the MHC of many species (e.g., 
Shum et al. 2001; Bos and Waldman 2006; Burri et al. 2008), 
including humans (Houlden et al. 1996; Jakobsen et al. 1998; 
Bos and Waldman 2006). 

To date, the domestic dog, Canis lupus familiaris, has pro- 
vided the model for MHC research in carnivores. In the dog 
leukocyte antigen (DLA; Kennedy et al. 2001; Wagner 2003), 
one class Ia gene is present, with three class Ib genes and
two pseudogenes. This class Ia gene is highly polymorphic,
with more than 50 alleles identified (Wagner 2003). Studies 
of the MHC class I genes in many carnivores, such as those 
in the superfamily Musteloidea, are however lacking. Here, 
we characterize the MHC class I genes of the European bad- 
ger (Meles meles). Meles meles is well suited to investigate 
how MHC selection and conferred immunological advan- 
tages (e.g., pathogen and parasite resistance) are regulated by 
mate choice in the wild. Meles meles has a long mating season 
(Buesching et al. 2009), delayed implantation (Thorn et al.
2004), putative superfoetation (Yamaguchi et al. 2006), and
a sensory predisposition toward olfaction (Buesching et al.
2002). In high-density populations M. meles has a polygynandrous
mating system (Dugdale et al. 2007, 2011) with high
levels of extra-group paternity (Dugdale et al. 2007) and low 
fecundity (Macdonald et al. 2009). Additionally, it has been 
the subject of a diverse range of endoparasitic disease stud- 
ies (Macdonald et al. 1999; Anwar et al. 2000, 2006; Newman 
et al. 2001; Rosalino et al. 2006; Nouvellet et al. 2010; Lizundia 
et al. 2011). In particular, M. meles is a wildlife reservoir of 
Mycobacterium bovis (Delahay et al. 2001; Mathews et al. 
2006; Riordan et al. 2011), an intracellular bacterium that is
the cause of bovine tuberculosis (bTB) in cattle and wildlife.
MHC class I-dependent immunity is known to play an im- 
portant role in the eradication of M. bovis in mice (Ladel 
et al. 1995). Meles meles with different MHC genotypes, or 
the presence/absence of certain MHC alleles, may therefore
have differential susceptibilities to M. bovis, which could contribute
to genetic-based bTB control strategies in badgers.

Figure 1. Schematic representation of the positions of the primers used for amplification of MHC class I sequences from cDNA/gDNA. The α1-α3 domains
are labeled at the exons encoding them.

In this study we: (1) characterized the MHC class I genes of
M. meles from a high-density population and tested for evi- 
dence of selection and recombination; (2) identified the tran- 
scription pattern by comparing genomic DNA (gDNA) and 
complementary DNA (cDNA) sequences from whole blood 
samples, which is important as MHC genes identified us- 
ing gDNA may be nonfunctional; and (3) performed phylo- 
genetic analyses to investigate whether M. meles sequences 
belong to monophyletic groups, or whether sequences tran- 
scend species boundaries. Characterization of MHC class I 
genes in M. meles will clarify whether these have a more 
rapid turnover rate than class II loci (Sin et al. 2012) and fa- 
cilitate elucidation of the underlying evolutionary processes 
within different regions. Moreover, the development of MHC 
markers will facilitate studies on the relationship between ge- 
netics and disease in this controversial animal, as well as other 
closely related species. 



Materials and Methods 

Sample collection and nucleic acid isolation 

Blood samples were collected from 11 badgers that resided
in eight different social groups (Sin et al. 2012) in Wytham
Woods, Oxfordshire, UK (global positioning system reference
51°46'26"N, 1°19'19"W). All trapping and handling
protocols are detailed in Macdonald and Newman (2002). 
These protocols were subject to ethical review and were 
performed under Natural England Licence (currently 
20104655) and UK Home Office Licence (PPL 30/2835). 
Approximately 3 mL of blood was taken by jugular venipunc- 
ture and collected in a vacutainer containing EDTA. Samples 
were stored at −20°C until DNA isolation was performed.
gDNA was isolated using the GFX Genomic Blood DNA 
Purification Kit (Amersham Biosciences, Little Chalfont, 
UK), following the scalable method in the manufacturer's 
protocol. In order to validate whether the identified alleles 
were transcribed, a 500-μL blood sample, from each of
the 11 individuals, was also transferred into RNAprotect 
Animal Blood Tubes (Qiagen, Hilden, Germany) and stored 
immediately at −20°C for less than a month before RNA
isolation. Total cellular RNA was isolated from each blood 
sample using an RNeasy Protect Animal Blood Kit (Qiagen). 
Methods of cDNA synthesis are detailed in Sin et al. (2012). 



Primer design and polymerase chain reaction 
(PCR) amplification 

MHC class I molecules are heterodimers that consist of an
α chain and a β2-microglobulin (β2m) molecule. The α chain
is composed of three extracellular domains, α1, α2, and α3
(encoded by exons 2, 3, and 4, respectively), a transmembrane
domain (exon 5), and a cytoplasmic domain (exon 6; Fig. 1;
Bjorkman and Parham 1990). The α1 and α2 domains are the α
chain regions that comprise specific sites forming the ABS.

To amplify the class I genes from M. meles, we tested 
published primers used successfully in other carnivores 
(Zhong et al. 1998; Aldridge et al. 2006). Oligonucleotide 
primers, which recognize highly conserved regions of the 
MHC class I genes, were also designed using OligoAn- 
alyzer 3.1 (Owczarzy et al. 2008), based on alignments 
with GenBank's nucleotide sequences from domestic dog 
(Canis lupus familiaris; AF218297, AF218299, AF218301,
AF218303, DQ056267, DQ056268, and M32283), harbour
seal (Phoca vitulina; U88874), domestic cat (Felis catus;
M26318, U07670, U07672 and U07674), horse (Equus caballus;
NM001123381), and human (Homo sapiens; AF287959,
U03907, NM002117, and NM002127). 

Using these primers on 10-30 ng of cDNA/gDNA, PCR 
amplification was performed in a 20-μL reaction mix that
also contained 0.5 μM of each primer (Table 1), 200 μM of
each dNTP, 1× PCR buffer (containing MgCl2; Qiagen), and
2 units of HotStarTaq (Qiagen). The PCR cycle began with
incubation at 94°C for 15 min, followed by 35 incubation
cycles at 94°C for 30 sec, annealing temperature (Table 1) for
30 sec, and 72°C for 60-90 sec according to amplicon length
(60 sec for amplification of exon 2 or exon 3 only), ending
with an extension step at 72°C for 10 min. The PCR products
were electrophoresed on a 1.5% agarose gel and visualized 
using ultraviolet light and ethidium bromide staining. A 100- 
bp DNA ladder (New England Biolabs, Herts, UK) was used to 
size the DNA fragments. Bands of expected size were excised 
from the gel and purified using QIAquick Gel Extraction Kits 
(Qiagen). PCR products that gave rise to relatively bright 
bands of the expected size were cloned and sequenced. 

Cloning and DNA sequencing 

Purified PCR fragments were cloned using the pGEM-T Easy 
Vector Systems (Promega, Madison, WI). The cloning and 
sequencing procedure is detailed in Sin et al. (2012). Be- 
tween nine and 72 clones were sequenced for each individual. 
Table 1. MHC class I-specific primers for Meles meles.

Primer name | Primer sequence | Product size (bp); region amplified | Ta (°C) | Source reference
F: Meme-MHCIex1F | GGCCCTGGCCGTGACC | 1012 bp; exon 1-exon 6 | 59 | This study
R: Meme-MHCI-ex6R | ATCAGAGCCCTGGGCACTGTC | | | This study
F: Meme-MHCIex2F | GGCTCCCACTCCCTGAGG | 543 bp; exon 2-exon 3 on cDNA | 59 | This study
R: Meme-MHCIex3R | GCGCAGCAGCGACTCCTT | 731-784 bp; exon 2-exon 3 on gDNA | | This study
R: PpLAa1L250 | GGCCTCGCTCTGGTTGTAG | 270 bp; exon 2 | 55 | Aldridge et al. (2006)
F: Meme-MHCIex3F | GGGTCTCACACCATCCAG | (Pairs with Meme-MHCIex3R) 273 bp; exon 3 | 59 | This study

F, forward; R, reverse; bp, base pair; Ta, annealing temperature.

Identical sequences were derived from a minimum of two 
badgers or from independent PCR reactions from the same 
individual, in compliance with DLA nomenclature rules 
(Kennedy et al. 1999). Single unique sequences (possible 
chimeras) were excluded. Nucleotide sequences were ana- 
lyzed using CodonCode Aligner 3.7.1 (CodonCode Corpo- 
ration, Dedham, MA) and were compared with known MHC 
class I sequences using the NCBI BLAST program (Altschul 
et al. 1990). The DNA sequences from M. meles were assigned 
the GenBank accession numbers JQ425427-JQ425447. 
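
The acceptance rule described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the record layout (sequence, individual, PCR reaction) and the function name `confirmed_sequences` are hypothetical.

```python
from collections import defaultdict


def confirmed_sequences(clones):
    """Return sequences supported by at least two independent sources.

    `clones` is an iterable of (sequence, individual_id, pcr_id) tuples.
    This record layout is a hypothetical choice made for the sketch.
    """
    sources = defaultdict(set)
    for sequence, individual_id, pcr_id in clones:
        # Clones from the same PCR of the same animal collapse to one source.
        sources[sequence].add((individual_id, pcr_id))
    # Two distinct (individual, PCR) pairs mean either two badgers or two
    # independent reactions from one badger; single unique sequences
    # (possible chimeras) are dropped.
    return {seq for seq, seen in sources.items() if len(seen) >= 2}
```

Counting distinct (individual, PCR) pairs rather than raw clones keeps several clones from one reaction from being mistaken for independent confirmation.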

Data Analyses 

Selection and recombination 

Selection at the amino acid level was measured as the rates 
of nonsynonymous (dN) and synonymous (dS) substitutions
per codon site, estimated in DnaSP 4.0 (Rozas et al. 2003)
and MEGA 4 (Tamura et al. 2007) according to the method
of Nei and Gojobori (1986), with Jukes and Cantor (1969)
correction. Standard errors were derived from 1000 bootstrap
replicates. Synonymous and nonsynonymous substitutions
were calculated separately for the ABS and non-ABS, as
determined by Bjorkman et al. (1987). CODEML in PAML
4.4b (Yang 2007) was used to check for positively selected sites
(PSS) in the α1 and α2 domains, which are indicated where
the ratio ω (nonsynonymous/synonymous substitution rate
ratio, dN/dS) exceeds 1, meaning that nucleotide mutations
that alter the amino acid sequence of a protein occur more
frequently than nucleotide mutations that do not alter amino
acids. Thus, ω > 1 is indicative of positive selection whereby
beneficial amino acid changes are fixed. Codon-based likelihood
analysis was used to test for evidence of positive
selection, using several models: M1a (nearly neutral), M2a
(positive selection), M7 (beta), and M8 (beta and ω). The
assumptions of these models are detailed in Yang et al. (2000,
2005). Two null models of neutral evolution (M1a and M7)
were applied and compared against the alternative models in
which they are nested, allowing stringent testing for positive
selection (M2a and M8, respectively, which assume a different
distribution of mutations; Anisimova et al. 2001, 2002). We determined
whether the alternative models (M2a and M8) provided a
significantly improved fit, versus their null models (M1a and
M7, respectively), using a likelihood ratio test (LRT), which
compares twice the difference of the log likelihoods
(2ΔlnL) to a χ2 distribution. To identify codons under positive
selection (posterior probabilities >0.95), through comparisons
of M1a versus M2a and M7 versus M8, CODEML
was used to calculate the Bayes Empirical Bayes (BEB) poste- 
rior probabilities (Yang et al. 2005) at each codon. Different 
domains were analyzed separately, as their evolutionary his- 
tories could be different due to recombination events in the 
intervening intron. 
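
As a worked illustration of the likelihood ratio test described above, the following Python sketch computes the test statistic and its P-value. It assumes two degrees of freedom, which is the value commonly used for the M1a versus M2a and M7 versus M8 comparisons but is not stated in the text; the function name and the log-likelihood values in the usage note are hypothetical.

```python
import math


def lrt_p_value_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models, assuming 2 degrees of freedom.

    Returns (statistic, p_value), where the statistic is twice the
    difference between the log likelihoods of the two models.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # Guard against tiny negative values caused by optimisation noise.
    statistic = max(statistic, 0.0)
    # For a chi-square distribution with 2 df, the upper-tail probability
    # has the closed form exp(-x / 2), so no statistics library is needed.
    return statistic, math.exp(-statistic / 2.0)
```

For example, with hypothetical log likelihoods of -1000.0 for the null model and -995.0 for the alternative, `lrt_p_value_df2(-1000.0, -995.0)` gives a statistic of 10.0 and a P-value below 0.01, so the model allowing positive selection would be preferred.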

Recombination analyses were performed on the nucleotide 
alignment spanning exon 2, intron 2, and exon 3 using RDP3 
alpha 44 (Martin et al. 2010). Four methods, that is, RDP 
(Martin and Rybicki 2000), GENECONV (Padidam et al. 
1999), MaxChi (Smith 1992), and Bootscan (Martin et al. 
2005), were applied in the first run to detect recombina- 
tion events. Default settings were applied with a maximum 
P-value of 0.05, applying Bonferroni correction for multi- 
ple comparisons. Any recombination signals, which were de- 
tected by at least three methods, were then rechecked with all 
available methods (Martin et al. 2010). Any putative recombi- 
nation events detected were verified further by examination 
of MaxChi plots and matrices and Neighbor-Joining trees 
from the inferred fragments, to assess recombinant desig- 
nation and breakpoint placement. Only the recombination 
events that were confirmed after these procedures were con- 
sidered significant. The effects of recombination and gene 
conversion on sequence evolution are similar in small se- 
quence fragments, therefore we did not differentiate between 
them, and we refer to them as recombination sensu lato here- 
after (Richman et al. 2003; Burri et al. 2008). 
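
The screening rule used in the first run (Bonferroni-corrected significance, with support from at least three methods) can be sketched as follows. This is a generic illustration of the rule as described in the text, not a reproduction of RDP3's internal calculations; the data layout, the function name, and the number of comparisons are assumptions.

```python
def supported_events(events, n_comparisons, alpha=0.05, min_methods=3):
    """Keep recombination events supported by enough detection methods.

    `events` maps an event name to a dict of {method: uncorrected P-value};
    this layout is hypothetical. A method supports an event when its
    P-value passes the Bonferroni threshold alpha / n_comparisons.
    """
    threshold = alpha / n_comparisons
    kept = []
    for name, p_by_method in events.items():
        n_support = sum(1 for p in p_by_method.values() if p <= threshold)
        if n_support >= min_methods:
            kept.append(name)
    return kept
```

Events that pass this screen would still need the manual checks described above (MaxChi plots and matrices, and Neighbor-Joining trees of the inferred fragments) before being accepted.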

Phylogenetic Analyses 

Phylogenetic analyses were performed on the consensus 
alignments of M. meles MHC class I exon 2, exon 3, and 
exon 4 sequences, against sequences from other species avail- 
able in GenBank (e.g., C. lupus familiaris [accession number: 
M32283, NM001014378, NM001014379, NM001014767, 
NM001020810, U55029], P. vitulina [PVU88874], F. catus 
[U07672, U07674], E. caballus [LOC100056062], H. sapiens 
[AF287959, NM002117.4, NM002127.5, U03907], Ailuropoda
melanoleuca [EU162658, EU162659], Leopardus
pardalis [U07678], Acinonyx jubatus [U07666], Bos taurus
[AB245424], and Monachus schauinslandi [Aldridge et al. 
2006]). Tree reconstruction was performed separately on 
three domains, in order to maximize the detection of differ- 
ent evolutionary histories due to genetic exchange. Domain 
borders were assigned according to Koller and Orr (1985). 

Phylogenetic networks allow visualization of reticulate 
phylogenetic signals (Huson and Bryant 2006) whereas 
phylogenetic trees may poorly describe complex evolutionary
scenarios; thus, phylogenetic networks provide an effective
way of evaluating evolutionary relationships involving
gene duplication and recombination. We used the Neighbor- 
Net algorithm in SplitsTree 4.12.3 (Huson and Bryant 2006) 
to analyze the phylogenetic relationships for exon 2, exon 3, 
and intron 2. We constructed a Neighbor-Net network based 
on uncorrected p-distances and conducted 1000 bootstrap 
replicates to estimate their support. 
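
The uncorrected p-distance on which the network is based is simply the proportion of aligned sites at which two sequences differ. The Python sketch below shows the calculation for one pair of aligned sequences. How SplitsTree treats gaps and ambiguous bases is not stated in the text, so skipping them pairwise is an assumption of this sketch, as is the function name.

```python
def p_distance(seq_a, seq_b, skip_chars="-N"):
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```

Because no substitution model is applied, the value is a raw proportion: one difference across eight compared sites gives 0.125.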

Bayesian phylogenetic inference was performed using 
MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003) to ana- 
lyze the phylogenetic relationship of exon 4. We used the 
web-based application FindModel (http://www.hiv.lanl.gov/ 
content/sequence/findmodel/findmodel.html) to ascertain 
the best-fit model of nucleotide substitution, which was iden- 
tified as the Hasegawa-Kishino-Yano plus gamma model 
(HKY+Γ). A Markov chain Monte Carlo (MCMC) search
was initiated with random trees and run for 3,000,000 gener- 
ations, sampling every 100 generations. Data were partitioned 
by gene codon positions. Two separate analyses and four in- 
dependent chains were executed. Convergence was indicated 
when the average standard deviation of split frequencies was 
less than 0.01 (Ronquist et al. 2005). We also checked for con- 
vergence by plotting the likelihood scores against generations 
and discarded the first 25% of the generations as "burn-in." 
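
The sampling settings above imply the following counts per run, shown here as a small worked calculation. The function name is hypothetical, and the count ignores the initial state of the chain, which some programs also record.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (total samples, burn-in samples, retained samples) for one run."""
    n_samples = generations // sample_every
    n_burnin = int(n_samples * burnin_fraction)
    return n_samples, n_burnin, n_samples - n_burnin


# Settings reported in the text: 3,000,000 generations, sampled every 100,
# with the first 25% discarded as burn-in.
total, burnin, retained = mcmc_sample_counts(3_000_000, 100, 0.25)
```

With these settings each run yields 30,000 sampled trees, of which 7,500 are discarded and 22,500 are retained.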

Results 

Diversity and transcription of MHC class I 
sequences 

Sixteen different sequences were isolated from gDNA and 
cDNA, using the primers detailed in Table 1, Figure 1, and
Table S1. A maximum of four putatively functional sequences
was derived from a single individual, indicating the pres- 
ence of at least two loci. Transcription analysis showed that 
nine of 16 sequences detected in the gDNA were also ampli- 
fied from the cDNA, which was isolated from whole blood 
(Fig. 2; Table 2), while four sequences were detected only in 
the gDNA (Fig. 2). 

Pseudogene features (Fig. 2) were found in four sequences:
nucleotide deletions caused a frameshift for Meme-MHC I*PS01,
Meme-MHC I*PS02, Meme-MHC I*PS03, and
Meme-MHC I*PS04N, which caused premature stop codons
(Fig. 2; PS signifies pseudogenes with a frameshift, N signifies
the presence of genomic sequences, where no cDNA
sequences were found). Pseudogene Meme-MHC I*PS04N
was detected only in gDNA, whereas Meme-MHC I*PS01,
Meme-MHC I*PS02, and Meme-MHC I*PS03 were only detected
in the cDNA, indicating the presence of a transcribed
nonfunctional pseudogene (Mayer et al. 1993; Fernandez-Soria
et al. 1998). The nucleotide deletion in Meme-MHC
I*PS01 and Meme-MHC I*PS03 occurred at the 5' primer
annealing site for exon 2 amplification from gDNA; thus, no
sequence was detected from gDNA. Nucleotide insertions
or deletions were detected in two of four sequences that
only amplified in the exon 2 region (Meme-MHC I*09N,
Meme-MHC I*PS10, Meme-MHC I*11, and Meme-MHC
I*PS12N; Fig. 2). The intron 2 of Meme-MHC I*PS08N is
45-56 bp longer than other putatively functional sequences; a 
longer intron was also found in a DLA pseudogene (DLA-53; 
Wagner 2003; Figs. 3 and 4C). Accordingly, the above se- 
quences were regarded as pseudogenes and were not included 
in the nucleotide substitution calculations. Only sequences 
without pseudogene features, and with both exon 2 and exon 
3 amplified from both gDNA and cDNA, were regarded 
as functional sequences and included in further analyses 
of nucleotide substitution and selection (i.e., Meme-MHC
I*01 to Meme-MHC I*07; Table 2; Fig. 2).
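
Two of the pseudogene features discussed above, frameshifting indels and premature stop codons, are easy to screen for computationally. The Python sketch below is a minimal illustration, not the authors' procedure: it assumes the input sequence starts in the correct reading frame and is an exon fragment in which any in-frame stop codon is premature, and both function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(seq):
    """Return the 0-based codon index of the first in-frame stop, or None.

    Assumes `seq` begins in frame; alignment gaps ('-') are removed first.
    """
    seq = seq.upper().replace("-", "")
    for i in range(len(seq) // 3):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def is_frameshift(length_difference):
    """An indel shifts the reading frame when its length is not a multiple of 3."""
    return length_difference % 3 != 0
```

For instance, deleting a single base from the hypothetical in-frame fragment `ATGGCCTTGAGC` gives `ATGGCTTGAGC`, whose third codon reads TGA, so the deletion both shifts the frame and introduces a premature stop.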

Of the seven putatively functional sequences, all were detected
in both gDNA and cDNA (Table 2; Fig. 2). Meme-MHC
I*05 was detected in all individuals. By contrast, Meme-MHC
I*01 and Meme-MHC I*02 were identified and expressed in
only one individual in this study (Table 2). One thousand one
hundred and fifty-three individuals from the same population
have been genotyped subsequently and these sequences
have been identified in more individuals (330 for Meme-MHC
I*01 and 132 for Meme-MHC I*02; Y. W. Sin, unpubl.
data). 

The majority of variable sites in the identified sequences 
were in exon 2 and exon 3, where most mutations repre- 
sented a nonsynonymous nucleotide substitution (Table 3). 
As a consequence, there were more polymorphic amino acid 
residues among the α1 and α2 domains (Fig. 2; Table 3),
which are involved in antigen binding. The average sequence
divergence was 6.5% (SE = 0.62%), 7.0% (SE = 0.84%),
and 1.0% (SE = 0.25%) in exon 2, exon 3, and exon 4, respectively.
Two of the sequences, Meme-MHC I*02 and I*04,
were highly similar, differing from each other at only one
nucleotide position in exon 2 (Fig. 4A) and five positions
in exon 3 (Fig. 4B). Meme-MHC I*03 was highly divergent
(9.3-9.6%) from Meme-MHC I*01, I*02, and I*04; Meme-MHC
I*07 was highly divergent (9.3-9.6%) from Meme-MHC
I*02 and I*04, in the exon 2 region. In the exon 3 region,
Meme-MHC I*05 and I*06 were highly divergent from the
other five sequences (8.8-12.3%). The intron 2 region of
Meme-MHC I*05 and I*06 was identical (Figs. 3 and 4C) yet
both of them were highly divergent from other putatively
functional sequences (28.3-30.1%).
Figure 2. Amino acid sequence identity for class I sequences of Meles meles and seven other mammals (Phoca vitulina, Ailuropoda melanoleuca,
Canis lupus familiaris, Felis catus, Acinonyx jubatus, Equus caballus, and Homo sapiens; GenBank accession numbers are provided in the Materials
and Methods). The complete amino acid sequence of Meme-MHC I*03 is shown. PS signifies pseudogenes with a frameshift. N signifies the presence
of genomic sequences, where no cDNA sequences were found. Single letters and dots within the alignment/sequence represent amino acids that are
distinct from or identical to Meme-MHC I*03, respectively. Dashes (-) indicate missing sequences. Numbers above the sequence indicate the codon
position. Arrows above the sequence label the beginning of a domain. Asterisks (*) indicate amino acid residues pointing toward the postulated
antigen-binding site, which were defined according to Bjorkman et al. (1987). Carets (^) indicate residues pointing up on an alpha-helix, postulated
to interact with peptides and/or T-cell receptors (TCRs), and dots above the sequence (.) indicate residues on an alpha-helix that are pointing away
from the antigen-binding site, postulated to interact with TCRs (Bjorkman et al. 1987). Conserved sites that bind the peptide N- and C-termini are
marked with gray boxes. The location of the N-linked glycosylation site is at position 88 (-CHO). Disulphide bonds formed between cysteine residues
are shown with a line spanning the two cysteine residues. Residues that form the β-sheet or α-helix, and residues that influence the binding of the
CD8 glycoprotein, are marked under the alignment. The columns on the left of the sequences indicate whether the sequences were found in gDNA
and/or cDNA.
Table 2. Presence of MHC class I sequences in genomic DNA (gDNA) and complementary DNA (cDNA) from 11 Meles meles. The total number
of clones with different sequences found is given. Total number of clones: numbers outside brackets are clones of polymerase chain reaction (PCR)
products from gDNA (numbers in brackets are clones of PCR products from cDNA).

Individuals: 1, 2, 3, 4, 5, 6¹, 7¹, 8¹, 9², 10², 11²

Sequence | Total no. of clones
Meme-MHC I*01 | 3 (2)
Meme-MHC I*02 | 8 (5)
Meme-MHC I*03 | 3 (27)
Meme-MHC I*04 | 7 (72)
Meme-MHC I*05 | 88 (49)
Meme-MHC I*06 | 12 (14)
Meme-MHC I*07 | 5 (26)

[Per-individual presence/absence cells could not be recovered from the extracted text.]

X represents sequence from cDNA. Blank represents no detected sequence. Gray shading indicates sequence from gDNA.
¹gDNA was not sequenced.
²cDNA was not sequenced.



Nine amino acid residues (three in the α1 domain and
six in the α2 domain) are located at the two ends of the
binding groove of the MHC molecule, where these residues 
bind the N- and C-termini of the presented peptide; these 
residues are highly conserved across vertebrates (Y7, Y59, 
Y84, Y123, T143, K146, W147, Y159, and Y171; Kaufman 
et al. 1994; see also Shum et al. 1999; Mesa et al. 2004). In 
M. meles, all of these residues except one were conserved 
(Fig. 2). Residue T143 is polymorphic in M. meles and four 
sequences have the common threonine (T), which is the 
same amino acid seen in six of seven species included in the 
sequence alignment (Fig. 2). The seventh species, E. caballus, 
has a serine (S) instead, which is the same as in the other 
three M. meles sequences. 

The M. meles sequences also coded for other conserved 
residues, characteristic of class I molecules, which are important
for the molecular structure and peptide binding.
These include four cysteine (C) residues (positions 103, 167,
206, and 262; Fig. 2) that form the intradomain disulphide
bonds; an N-linked glycosylation site in the α1 domain (position
88; Fig. 2); and a threonine (T) residue at position
136 (Fig. 2) that is critical for interactions with transporter
associated with antigen processing (TAP) complexes and endogenous
peptide loading (Peace-Brewer et al. 1996). Two regions in the α3
domain (position 222-232 and 248-259; Fig. 2), bearing 
many negatively charged residues, interact with the positively 
charged side chains of the CD8 glycoprotein on the T cells 
(Kaufman et al. 1994). Twenty of the 23 residues in these 
two regions are conserved in the M. meles sequences, except
that glutamic acid (E224) replaces glycine (G) in Meme-MHC
I*03, Meme-MHC I*04, and Meme-MHC I*07, leucine
(L227) replaces glutamine (Q) in all sequences, and glutamine
(Q256) replaces glutamic acid (E) in all sequences. The glutamine
(Q229) residue (Fig. 2), which is important for CD8



Figure 3. Nucleotide sequence identity for the MHC class I intron 2 of Meles meles clones and C. lupus familiaris. The GenBank accession numbers for
sequences from C. lupus familiaris are NW876254.1 and U55029.1. The complete nucleotide sequence of Meme-MHC I*03 is shown; numbers above
the sequence indicate the nucleotide position. PS signifies pseudogenes with a frameshift. N signifies the presence of genomic sequences, where no
cDNA sequences were found. Single letters and dots (.) represent nucleotides that are distinct from or identical to Meme-MHC I*03, respectively. Dashes
(-) indicate missing sequences. The degenerate 13-bp sequence motif (CCNCCNTNNCCNC) that is crucial in crossover events at human recombination
hotspots (Myers et al. 2008) is marked with gray boxes. The nucleotide breakpoint for Meme-MHC I*05 and Meme-MHC I*06 is marked with an
asterisk (*) above the alignment.
Table 3. Sequence polymorphism of MHC class I genes/molecules
delineated by exon/domain (leader, α1, α2, α3, transmembrane, and
cytoplasmic domains encoded by exons 1, 2, 3, 4, 5, and 6, respectively). The
nucleotide and derived amino acid sequences isolated from
11 Meles meles were compared, and the numbers of synonymous and
nonsynonymous nucleotide substitutions and polymorphic amino acid
residues are shown.



Exons                              1*    2    3    4    5    6*
Sequences for comparison            5    7    7    5    5    5
Variable sites                      0   37   44    5    7    2
Mutations                           0   39   47    5    7    2
Synonymous                          0    7   14    2    1    1
Nonsynonymous                       0   28   26    3    6    1
No. of amino acids                  4   90   92   92   29   18
Polymorphic amino acid residues     0   23   24    3    6    1

*Only part of the exon is included in this study.



accessory functions (Durairaj et al. 2003), is conserved here 
and in other mammalian, avian, and reptilian sequences 
(Kaufman et al. 1994). 
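
The counts in Table 3 are easier to compare across domains when expressed as proportions. The following is a minimal sketch that uses only the "No. of amino acids" and "Polymorphic amino acid residues" rows for the three α domains; the dictionary layout and function name are illustrative assumptions, not part of the original analysis.

```python
# Counts copied from Table 3 (alpha domains only: exons 2, 3, and 4).
# The data structure and names below are illustrative, not from the study.
TABLE3 = {
    "alpha1": {"amino_acids": 90, "polymorphic": 23},
    "alpha2": {"amino_acids": 92, "polymorphic": 24},
    "alpha3": {"amino_acids": 92, "polymorphic": 3},
}


def polymorphic_fraction(domain):
    """Fraction of amino acid residues in a domain that are polymorphic."""
    row = TABLE3[domain]
    return row["polymorphic"] / row["amino_acids"]


for name in TABLE3:
    print(f"{name}: {polymorphic_fraction(name):.1%} of residues polymorphic")
```

On these counts, roughly a quarter of residues are polymorphic in each antigen-binding domain, against about 3% in the α3 domain.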

Selection and recombination 

Rates of nonsynonymous substitutions were higher than
synonymous substitutions in the ABS in both the α1 and
α2 domains and the non-ABS of the α1 domain (Fig. 5).
Synonymous substitutions, however, were higher than
nonsynonymous substitutions in the α3 domain (which has no
ABS) and the non-ABS in the α2 domain. PAML models M2a
(positive selection) and M8 (beta and ω), which permit
positive selection at a subset of codon sites, gave a significantly
better fit than did the models without positive selection in
the α1 domain (Table 4). That is, positive selection signals
were detected in the α1 domain. In the α2 domain, however,
models that allow for positive selection did not provide a
significantly better fit than the models of neutral evolution
(Table 4). Parameter estimates (Table 4) indicate that 3.8%
(under both M2a and M8) of α1 amino acid sites are
under positive selection with ω = 13.7 (M2a) and ω = 14.4
(M8). Two sites were identified as being under positive
selection (Table 4; Fig. 6); both PSS were within the ABS (Fig. 6;
Kaufman et al. 1994).
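
Model comparisons of this kind are usually made with a likelihood ratio test (LRT). The sketch below assumes the conventional pairing of each selection model with a nested null model that differs by two free parameters (typically M1a for M2a and M7 for M8), so that twice the log-likelihood difference is compared against a chi-square distribution with 2 degrees of freedom. The log-likelihood values are hypothetical placeholders, not values from Table 4.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by two parameters.

    Returns (statistic, p_value). For a chi-square distribution with
    2 degrees of freedom the survival function is exactly exp(-x / 2),
    so no statistics library is needed.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, math.exp(-stat / 2.0)


# Hypothetical log-likelihoods for a null model and a selection model.
stat, p_value = lrt_df2(lnl_null=-1520.4, lnl_alt=-1512.1)
print(f"2*delta(lnL) = {stat:.2f}, P = {p_value:.2g}")
```

A small P value favors the model that allows a class of sites with ω > 1; a large one means the extra parameters do not improve the fit, as reported here for the α2 domain.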

One significant recombination event was detected in
M. meles class I sequences, indicating that both Meme-MHC
I*05 and Meme-MHC I*06 were likely to have originated
from Meme-MHC I*PS08N and Meme-MHC I*04. The
recombination signals were significant for RDP, GENECONV,
BootScan, SiScan (all P < 0.001), MaxChi, and Chimera
(both P < 0.01). Two recombination breakpoints were
determined, of which one was situated in exon 2 (nucleotide
position 216 from the beginning of exon 2; MaxChi: P <
0.001) and one in intron 2 (nucleotide position 382; MaxChi:
P < 0.001; Fig. 3).



Phylogenetic analyses 

The phylogenetic tree of α1 (Fig. 4A) highlights that
sequences from M. meles did not form a monophyletic clade;
rather, they were intermingled with phocine, canine, and
ursine sequences. No sequences from the other species
included in the analysis formed a monophyletic clade, except
for those from H. sapiens and M. schauinslandi. Support
for clades with mixed sequences from different species
was nevertheless not strong (e.g., bootstrap value = 56 for the
clustering of Aime-128 and Meme-MHC I*01, 02, and 04). Some
M. meles pseudogenes, however, formed highly or
moderately supported clades without grouping with any
putatively functional sequences (e.g., Meme-MHC I*PS02 and
Meme-MHC I*PS03; Meme-MHC I*PS08N, Meme-MHC
I*09N, and Meme-MHC I*PS10). For the sequences
encoding the α2 domain (Fig. 4B), all M. meles sequences,
except for Meme-MHC I*PS08N, formed a monophyletic
clade. Within this clade, pseudogenes Meme-MHC I*PS04N,
Meme-MHC I*PS02, and Meme-MHC I*PS03 grouped
together. Sequences from A. melanoleuca and H. sapiens also
formed monophyletic clades. For α3 sequences, M. meles
formed a distinct and highly supported clade (posterior
probability [PP] support = 1.0; Fig. 7) that grouped together with
P. vitulina and A. melanoleuca (PP = 0.67). This clade
further grouped with all carnivores in our comparison, including
canine and feline sequences (PP = 0.86), separating it from
equine, bovine, and human sequences.
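
The distinction drawn above between species-specific clustering and intermingled sequences amounts to a monophyly test on a tree. A minimal sketch follows, assuming a rooted tree written as nested tuples with tip labels as strings. The toy trees and labels are hypothetical and do not reproduce Figure 4 or Figure 7, and the Neighbor-Net networks used in this study are not simple rooted trees.

```python
def leaves(tree):
    """Return the set of tip labels under a nested-tuple tree node."""
    if isinstance(tree, str):
        return {tree}
    tips = set()
    for child in tree:
        tips |= leaves(child)
    return tips


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the tips in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical toy trees: tips are sequence labels prefixed by species.
intermingled = ((("badger_a", "seal_a"), "badger_b"), "dog_a")
clustered = ((("badger_a", "badger_b"), "seal_a"), "dog_a")
badger = {"badger_a", "badger_b"}

print(is_monophyletic(intermingled, badger))  # badger tips split by a seal tip
print(is_monophyletic(clustered, badger))     # badger tips form one clade
```

The first toy tree mimics the α1 pattern, in which sequences from one species are interrupted by sequences from another; the second mimics the species-specific clustering seen for α2 and α3.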

Discussion 

Diversity and transcription of MHC class I 
sequences 

This is the first study to characterize MHC class I genes in
M. meles. Moreover, this work was performed using both
the genome and transcriptome. The sequences identified
included pseudogenes and putatively functional sequences
that encode domains necessary to form functional class I
molecules. The major structural features (Kaufman et al.
1994) that distinguish class Ia molecules are all present
in these putatively functional sequences, including highly
conserved amino acid residues that bind the N- and
C-termini of the peptide, cysteine (C) residues that form the
disulphide bonds, an N-linked glycosylation site, a threonine
(T) residue for interaction with the TAP complex, and regions that
interact with CD8 glycoproteins on the T cells. In addition,
we demonstrate that all putatively functional sequences were
expressed at the RNA level in whole blood. These features,
together with the detected polymorphisms, indicate that these
sequences belong to at least two class Ia loci, in contrast to
class Ib genes that are typically monomorphic (Rodgers and
Cook 2005).
Figure 4. Neighbor-Net networks of MHC class I (A) exon 2/α1-domain, (B) exon 3/α2-domain, and (C) intron 2 sequences from Meles meles and
other mammalian species including phocine, ursine, canine, feline, equine, and human (GenBank accession numbers are provided in the Materials
and Methods). Meles meles class I sequences are marked with gray boxes. PS signifies pseudogenes with a frameshift. N signifies the presence of
genomic sequences, where no cDNA sequences were found. Only bootstrap support values above 50 are shown; the central webbing indicates loops
that have conflicting support for alternative branching events. Inclusion of 45 DLA-88 alleles (i.e., different major allelic types; Kennedy et al. 1999,
2001) formed a distinct clade comprising DLA-88, 12, 53, and A9, as shown in Figure 4; therefore, only one allele for DLA-88 is shown.



The variability in the number of class I sequences in 
M. meles is intermediate compared to other carnivores. The 
most closely related species for which class I genes have been 
characterized is the Hawaiian monk seal (M. schauinslandi; 
Aldridge et al. 2006), which has at least two loci. No vari- 
ability was found, however, within more than 80 individ- 
uals of this endangered seal species. Within the infraorder 
Arctoidea, the giant panda (A. melanoleuca) has two classical 
genes (Aime-152 and Aime-128; Pan et al. 2008), plus a non- 
classical gene closely related to DLA-79. Aime-152 appears 
to be monomorphic, while nine Aime-128 sequences were 
found in five individuals (Pan et al. 2008). The carnivores 



that, to date, have had their MHC organization characterized
most thoroughly are C. lupus familiaris and F. catus. In the
feline leukocyte antigen (FLA), there are at least three class
Ia genes, in common with human and murine MHCs (Yuhki
et al. 2007). In DLA (Burnett et al. 1997; Kennedy et al. 1999;
Wagner et al. 1999; Wagner 2003), there is one class Ia gene
(DLA-88), three class Ib genes (DLA-79, -12, and -64), and
two pseudogenes (DLA-53 and -12a). Among these, DLA-88
is the most polymorphic gene, with more than 50 alleles
identified (Wagner 2003). As discussed above, the number
of class I genes and their organization can differ greatly between
species (Kelley et al. 2005). Mammals usually possess
1-3 class I genes (Nei et al. 1997) and have a variable range
of polymorphism (Parham and Ohta 1996; Wagner 2003).
Considering the extensive geographical range of M. meles
and their considerable socio-spatial variability (Macdonald
et al. 2004; Rosalino et al. 2004; Newman et al. 2011), it is
highly likely that more sequences will be detected as and when
other populations are examined.



Our transcription analysis demonstrated that not all the
detected sequences were expressed in whole blood. In
addition to the seven putatively functional sequences, for which
the sequences detected from gDNA and cDNA were identical,
nine sequences were identified as pseudogenes. The
phylogenetic analyses indicate that Meme-MHC I*PS02,
Meme-MHC I*PS03, and Meme-MHC I*PS04N, which form a
gray bar) for antigen-binding site (ABS), non-ABS, and combined (ABS + non-ABS) at the three α domains of the Meles meles MHC class I loci. The
number of codons for each region is given under the x-axis. Asterisk (*) indicates no ABS in the α3 domain.



strongly supported clade, belong to the same pseudogene
locus, with Meme-MHC I*PS02 and Meme-MHC I*PS03
both present in two individuals and Meme-MHC I*PS04N
detected in another two individuals. Another grouping of
Meme-MHC I*PS08N, Meme-MHC I*09N, and Meme-MHC
I*PS10 indicated that these belong to two closely related
pseudogene loci, as 1-3 sequences of this group were detected
from seven individuals separately. Other studies have shown
that many MHC sequences can be detected at the genomic
level, but not at the cDNA level (de Groot et al. 2004). The
presence of expressed nonfunctional MHC pseudogenes has
also been reported (Mayer et al. 1993; Fernandez-Soria et al.
1998), even in the class II genes of M. meles (Sin et al. 2012).
This is concordant with the finding that the MHC class I and
class II regions have duplicated many times, generating many
pseudogenes in addition to novel functional genes (Beck et
al. 1999).
Evolution of the MHC class I genes 

Our phylogenetic analyses reveal the evolutionary histories
of the class I domains we examined to be very different from
each other. Within the α1 domain, there were more
nonsynonymous than synonymous nucleotide substitutions in
both the ABS and non-ABS, contributing to a higher
nucleotide and amino acid sequence diversity. Evidence of
positive selection was detected in the α1 domain, in which PSS



were all within the ABS (Fig. 6A). Meles meles sequences of
the α1 domain were not monophyletic within this species
but intermingled with sequences from other species (e.g., P.
vitulina), a phenomenon characteristic of MHC genes. This
could be due to trans-species polymorphism (Klein 1987;
Hughes and Yeager 1998; Klein et al. 1998), whereby
balancing selection maintains this ancestral variation over a long
period of time (many generations), even after species
divergence (Penn and Potts 1999; Bernatchez and Landry 2003),
leading to differences between the gene tree and the species tree. As
the α1 domain is responsible for antigen binding, the
diversifying and balancing selection that drives and maintains this
ABS polymorphism permits a population to present a wider
repertoire of antigens, thus increasing its ability to combat
pathogenic and parasitic infections (Hughes and Nei 1992;
Hughes and Yeager 1998). The clustering of sequences among
species could also be due to orthology, whereby sequences
from an orthologous gene will cluster together, which may
produce a clustering pattern similar to that observed with
trans-species polymorphism. Without
information about locus identity, it was not possible to
distinguish between signals of orthology and trans-species
polymorphism. Recombination between exons that encode ABSs
can also increase allelic diversity in MHC genes (Ohta 1991;
Jakobsen et al. 1998; Shum et al. 2001; Bos and Waldman
2006). We detected a recombination event at the 3' end of
exon 2 that may have increased variation in M. meles class I
genes. Nevertheless, point mutation and selection have been
proposed as the major reason for high diversity in the MHC
(Nei et al. 1997; Piontkivska and Nei 2003; Nei and Rooney
2005).

As is the case for the α1 domain, the α2 domain functions
as an antigen-binding domain and showed a higher
nonsynonymous/synonymous rate ratio (ω) within the ABS than
in the non-ABS (Fig. 6B). Comparisons of maximum
likelihood models that allow for positive selection (M2a and
M8) with their corresponding null models, however, detected no
significant positive selection acting on this region.
A possible explanation for less intense diversifying selection
on the α2 domain would be that it has more conserved sites
than the α1 domain and so purifying selection may act to
maintain the structural features of the MHC molecule. Our
phylogenetic analyses also showed that all M. meles α2
domain sequences were grouped together except for one
pseudogene (Meme-MHC I*PS08N clustering with DLA-64 but
with low bootstrap support [72.4]; Fig. 4B), a very different
pattern from the one that we observed in the α1 domain.
The birth-and-death model of evolution predicts that this
type of clustering by species (Nei et al. 1997; Piontkivska and
Nei 2003) will result in fewer orthologous relationships, due
to higher rates of gene duplication and deletion.
Mechanistically, a differential rate of birth-and-death evolution explains
the more rapid turnover rate of class I genes than class II
genes in mammals (Hughes and Nei 1989; Piontkivska and
Nei 2003) and possibly reveals why sequences are more closely
related within M. meles than to sequences from other species.
The α1 sequences we observed, however, showed that
ancestral polymorphism or orthology at multiple lineages was
maintained after the analyzed species (Fig. 4A) diverged from
one another, which indicates that different selective pressures
act on different class I gene regions. Under this
birth-and-death model, balancing selection at the α1 domain should
maintain ancestral polymorphism even after species
divergence, while strong purifying selection and recent
duplication are required to produce a species-specific clustering at
the α2 domain.

Alternatively, and more plausibly, species-specific gene
clusters may result from concerted evolution (Hess and
Edwards 2002; Joly and Rouillon 2006), where genes are
homogenized by recombination. Recombination in the α2
coding region could produce M. meles sequences that are
more similar to one another than to the sequences from other
species. Under this scenario, a recombination breakpoint is
needed to separate selection on the α1 and α2 regions, given
that they exhibit different evolutionary histories. Our
recombination analysis revealed that there are indeed
recombination breakpoints located in the middle of the α-helix of the α1
domain and in the middle of intron 2, which is between the
α1 and α2 coding exons. These recombination breakpoints
indicate the probable crossover location and they most likely
antigen-binding sites that were defined according to Bjorkman et al. (1987). The dotted line indicates ω = 1. Error bars indicate the standard error of
the mean. Arrows indicate significant positively selected sites identified by the Bayes Empirical Bayes procedure (P > 0.95).



contributed to the independent evolutionary histories of the
α1 and α2 domains by recombination in the α2 encoding
region. Recombination within intron 2 has also been reported
in other species (Holmes and Parham 1985; Shum et al. 2001;



Bos and Waldman 2006), which gives greater support to the
hypothesis of concerted evolution through shuffling of the
α1 and α2 domains. These recombination breakpoints are
shared among some M. meles sequences, demonstrating that
Figure 7. Phylogenetic tree of MHC class I exon 4/α3-domain sequences from Meles meles and other mammalian species (GenBank accession
numbers are provided in the Materials and Methods), based on the 50% majority rule tree from the Bayesian analysis. Bayesian posterior probabilities
above 50% are shown above the branches. Meles meles class I sequences are marked with gray boxes. PS signifies pseudogenes with a frameshift.
N signifies the presence of genomic sequences, where no cDNA sequences were found.



breakpoints are not random and that these recombination
events are due to in vivo recombination rather than in vitro
chimera formation. Coincidentally, the degenerate 13-bp
sequence motif, which is crucial in the crossover event of
human recombination hotspots (Myers et al. 2008), was also
found in the M. meles intron 2 sequences in which the
recombination breakpoints were identified (Fig. 3). Variation
in the zinc-finger protein PRDM9, however, affects
species-specific binding to the sequence motif (Myers et al. 2010);
hence, more studies are needed to elucidate the
recombination mechanism in M. meles.
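
Locating a degenerate motif such as CCNCCNTNNCCNC in a sequence is a simple pattern search. The sketch below is one way to do it, assuming that N stands for any of A, C, G, or T and that only the forward strand is scanned. The toy sequence is hypothetical and is not a badger intron 2 sequence, and the function names are illustrative.

```python
import re

MOTIF = "CCNCCNTNNCCNC"  # degenerate 13-bp motif (Myers et al. 2008)


def motif_to_regex(motif):
    """Translate a motif with N wildcards into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def find_motif(sequence, motif=MOTIF):
    """Return 0-based start positions of all matches, overlapping ones included.

    A lookahead is used so that matches sharing bases are all reported.
    Only the forward strand is scanned.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence with one embedded occurrence of the motif.
toy = "AACCACCGTAGCCTCAA"
print(find_motif(toy))
```

Scanning the reverse complement as well would be a natural extension, since the motif can lie on either strand.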

Phylogenetic relationships of the α3 encoding sequences
were more similar to those of the α2 than the α1
domain, demonstrating species-specific clustering of M. meles
α3 sequences. This indicates that a recombination event
most plausibly separated the α1 domain and the
remaining 3' segments of the gene, leading to inconsistencies
between the α1 gene tree and the α2 and α3 gene trees.
However, the α3 domain is not antigen binding and it
shows a very low number of mutations and polymorphic
amino acid residues. In contrast to the antigen-binding
domains, where positive selection increases
diversity, strong purifying selection on the α3 domain could eliminate
mutations leading to unfavorable changes in the molecular
structure or the regions that interact with T cells (Kaufman
et al. 1994). The relatively conserved α3 domain is thus
used to reconstruct evolutionary relationships more
frequently than the α1 and α2 domains (e.g., Glaberman and
Caccone 2008). Here, the phylogenetic relationships of the
α3 sequences follow the molecular phylogeny of the extant
Carnivora (Flynn et al. 2005; Fulton and Strobeck 2006),
in which Musteloidea, Pinnipedia, and Ursoidea together
form the infraorder Arctoidea. This infraorder then groups
with Canidae to form the suborder Caniformia, which together
with Feliformia comprises the order Carnivora. An exception
is P. vitulina, which forms a clade with A. melanoleuca
instead of M. meles. Musteloidea diverged from Ursoidea and
Pinnipedia around 36 and 35.5 million years ago,
respectively (Bininda-Emonds et al. 1999). The proximity of these
divergence times might lead to incomplete lineage sorting
(Maddison and Knowles 2006), which is common in highly
polymorphic genes (Lu 2001).

Conclusions 

Our findings highlight the importance of examining gene
regions separately in order to obtain a more comprehensive
understanding of MHC evolution. Although trans-species
polymorphism was not extensive in the MHC class II
genes (e.g., DRB and DQB; Sin et al. 2012) of M. meles, it
was nevertheless observed. The class II genes are therefore likely
to be undergoing balancing selection, whereas phylogenetic
inconsistency between the α1 and α2/α3 domains of the class
I genes indicates that these domains have different
evolutionary histories. Concerted evolution provides a more plausible
hypothesis than birth-and-death evolution in the α2 and α3
domains, given the intron 2 breakpoint and separate
evolutionary history of the α1 domain. Probable recombination
has homogenized the α2 and α3 encoding exons, after
ancestral gene duplication. As the α2 domain is antigen binding,
whereas α3 is not, these domains were subject to different
selective pressures leading to the observed difference in the
degree of sequence divergence over evolutionary time.
Separated from the α2 and α3 encoding exons, the α1 domain
is likely to be subject to balancing selection (Nei et al. 1997).
The high number of pseudogenes we found is typical in class
I loci, which often show a high turnover rate, generating
an abundant number of pseudogenes with various degrees
of divergence from their functional counterparts (Hughes
1995; Cadavid et al. 1996; Beck et al. 1999; Piontkivska and
Nei 2003). Nonfunctional sequences could serve as a
reservoir for genetic exchange (Ohta 1991; de Groot et al. 2004),
which would be the case here if the recombination event
happened after the parental sequence Meme-MHC I*PS08N lost
its function.

This is the first study to characterize MHC class I genes 
in a species within the superfamily Musteloidea. Given the 
polygynandrous mating system of M. meles, where high levels 
of extra-group paternity are observed (Dugdale et al. 2007), 
further studies of mate choice and MHC would be informa- 
tive. Examination of the association between MHC genotypes 
and pathogens could also have significant implications for re- 
search into bTB epidemiology in badgers (Allen et al. 2010). 